Shannon’s Entropy of The Stochastic Context-Free Grammar and an Application to RNA Secondary Structure Modeling
Abstract
Stochastic context-free grammars (SCFGs) have been used in RNA secondary structure modeling. An SCFG consists of a set of grammar rules, each with an associated probability. Given a grammar design, finding the set of probabilities that yields optimum performance can be challenging. Although current Expectation Maximization (EM) Maximum Likelihood (ML)-based model training approaches have been effective, there is no guarantee that the parameter sets they provide give the grammar optimum performance. In this work, an analytical measure of the SCFG space, denoted here as Grammar Space (GS) entropy, is introduced and calculated for various SCFG models in the literature. It is shown that more accurate models have lower GS entropy. Finally, based on the GS entropy, a novel RNA structure model training method is proposed.
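As a rough illustration of what "a set of grammar rules, each with an associated probability" and an entropy over the grammar's derivations can look like, the sketch below computes the classical derivational entropy of a small SCFG. This is an assumption-laden example: the toy grammar, its probabilities, and the function names are hypothetical, and the quantity computed is the standard derivational entropy of a proper, subcritical SCFG, which may differ from the GS entropy defined in the paper.

```python
import math

# Hypothetical toy grammar (not taken from the paper).
# Keys are nonterminals; any symbol that is not a key is a terminal.
# Each rule is (right-hand side, probability).
GRAMMAR = {
    "S": [
        (("(", "S", ")"), 0.3),  # a base pair enclosing a substructure
        ((".", "S"), 0.3),       # an unpaired base followed by more structure
        ((".",), 0.4),           # a single unpaired base, ends the derivation
    ],
}


def check_normalized(grammar, tol=1e-9):
    """Rule probabilities must sum to 1 for every nonterminal."""
    for lhs, rules in grammar.items():
        total = sum(p for _, p in rules)
        if abs(total - 1.0) > tol:
            raise ValueError(f"probabilities for {lhs} sum to {total}")


def rule_entropy(rules):
    """Shannon entropy (bits) of the rule choice for one nonterminal."""
    return -sum(p * math.log2(p) for _, p in rules if p > 0)


def derivational_entropy(grammar, start="S", max_iters=10000, tol=1e-12):
    """Entropy (bits) of the distribution over derivations from `start`.

    Solves H(A) = h(A) + sum_r p_r * sum_{B in rhs(r)} H(B) by fixed-point
    iteration, which converges when the grammar is subcritical (each
    nonterminal produces, on average, less than one nonterminal overall).
    """
    check_normalized(grammar)
    h = {a: rule_entropy(rules) for a, rules in grammar.items()}
    H = dict.fromkeys(grammar, 0.0)
    for _ in range(max_iters):
        new = {
            a: h[a] + sum(
                p * sum(H[s] for s in rhs if s in grammar)
                for rhs, p in rules
            )
            for a, rules in grammar.items()
        }
        if max(abs(new[a] - H[a]) for a in grammar) < tol:
            return new[start]
        H = new
    raise RuntimeError("no convergence; the grammar may not be subcritical")


if __name__ == "__main__":
    print(derivational_entropy(GRAMMAR))
```

For this single-nonterminal grammar the fixed point has a closed form, the rule entropy divided by (1 - 0.6), because each application of `S` produces 0.6 further `S` symbols on average. Fixed-point iteration is used here only to keep the sketch free of external dependencies; a linear solve would work equally well.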
Similar resources
Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling
Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and thus are ideal for modeling nucleotide canonical base pairs that constitute the RNA secondary structure. Stochastic grammars h...
Stochastic Context-Free Grammars and RNA Secondary Structure Prediction
This thesis focuses on the prediction of RNA secondary structure using stochastic context-free grammars (SCFG). The RNA secondary structure prediction problem consists of predicting a 2-dimensional structure from a 1-dimensional nucleotide sequence. The theory behind SCFG is explained and an overview of the research literature on various methods in the field of secondary structure prediction is g...
RNA Modeling Using Gibbs Sampling and Stochastic Context Free Grammars
A new method of discovering the common secondary structure of a family of homologous RNA sequences using Gibbs sampling and stochastic context-free grammars is proposed. Given an unaligned set of sequences, a Gibbs sampling step simultaneously estimates the secondary structure of each sequence and a set of statistical parameters describing the common secondary structure of the set as a whole. T...
An evolutionary algorithm for stochastic context-free grammar design, with applications to RNA secondary structure prediction
Stochastic Context-Free Grammars (SCFGs) have been used widely in modelling RNA secondary structure. They were motivated by the use of Hidden Markov Models (HMMs) in protein modelling (Krogh et al., 1993). What HMMs lacked, though, was the ability to model the long-range interactions that are necessary for an effective model of RNA secondary structure. Thus, SCFGs, as generalisati...
Stochastic modeling of RNA pseudoknotted structures: a grammatical approach
MOTIVATION: Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally a context-sensitive grammar is required, which would impose a large...